Abstract

On the Web, visits to a page are often introduced by one or more valuable linking sources. Indeed, good backlinks are valuable resources for Web pages and sites. We propose discovering and leveraging the best backlinks of pages for ranking. Similar to PageRank, MaxRank scores are updated recursively. In particular, with probability λ, the MaxRank of a document is updated from the backlink source with the maximum score; with probability 1 − λ, the MaxRank of a document is updated from a random backlink source. MaxRank has an interesting relation to PageRank. When λ = 0, MaxRank reduces to PageRank; when λ = 1, MaxRank considers only the best backlink it finds. Empirical results on Wikipedia show that the global authorities are very influential. Overall, large values of λ (but smaller than 1) perform best: convergence is dramatically faster than for PageRank, while the ranking remains comparable. We study the influence of these sources and propose a few measures, such as the number of times a page is the best backlink for others, together with related properties of the proposed algorithm. The introduction of best backlink sources provides new insights for link analysis. Besides ranking, our method can be used to discover the most valuable linking sources for a page or Website, which is useful for both search engines and site owners.

1 Introduction 

The gigantic size and diverse content of modern databases have made ranking algorithms fundamental components of search systems [?]. The link analysis approach to ranking has proven very effective in evaluating the quality of Webpages [20, 25], with wide practice in industry and intensive study in academia. This success shows that the hyperlinks on the Web are useful in finding high-quality sources, which is hard to do based only on the content of pages. PageRank and HITS are two seminal algorithms in the literature. PageRank finds authorities, which are the pages frequently visited by a random surfer. HITS finds both authorities and hubs, which are defined recursively: the authorities are frequently linked by the hubs, which in turn are the pages that frequently link to authorities. In this paper, we focus on finding authorities in the spirit of PageRank, though our techniques may also apply to HITS and other link analysis algorithms.

PageRank. In the PageRank formulation, with probability c, a random surfer follows the links on a page uniformly at random, and with probability 1 − c, the surfer jumps to a new page selected uniformly at random from the database. The PageRank value of a page is defined as the probability of visiting the page in the long run of the random walk (e.g., see [?]).

Suppose there are N documents in the database. All vectors are column vectors. The transpose of a matrix X is denoted by $X^T$. We need the following notation. Let

$L$ be an adjacency matrix of the database. That is, $L(i,j) = 1$ if there is a link from document i to document j, otherwise $L(i,j) = 0$, $i, j = 1, 2, \ldots, N$;

$\bar{L}$ be the row-normalized matrix of $L$;

$e$ be a vector of all 1s, and $w$ be a vector of probabilities that sum to one; and

$S$ be a stochastic matrix such that

$$S = \bar{L} + a\, w^T, \qquad a = (a_1, a_2, \ldots, a_N)^T,$$

where $a_i = 1$ if document i is dangling (i.e., document i has no forward link) and $a_i = 0$ otherwise.

The transition probability matrix used by PageRank is

$$G = cS + (1-c)\, e\, v^T,$$

where v (often called the teleportation vector) is a probability vector that sums to one. Matrix G is sometimes called the Google matrix in the literature [?]. One merit of the Google matrix is that it is stochastic and primitive, and thus its steady-state distribution (also called the stationary distribution) exists. In fact, PageRank (denoted by π) is exactly the steady-state distribution vector of G, satisfying

$$\pi = G^T \pi.$$

The other merit of G is that it does not have to be stored: the power iteration for computing π can take advantage of the rank-1 matrix $e v^T$, manipulating S, c, e and v directly (e.g., see [12]).
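The rank-1 trick can be made concrete with a small power-iteration sketch. It is a minimal illustration under stated assumptions, not the implementation used in this paper: pages are numbered 0..N−1 and given as an adjacency list `out_links` (a hypothetical input format), the teleportation vector is uniform (v = e/N), the dangling distribution w is taken equal to v, and there are no duplicate links. It never forms G or S; the products with $e v^T$ and $a w^T$ collapse to two scalars per iteration.

```python
def pagerank(out_links, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for pi = G^T pi with G = c*S + (1 - c) * e v^T.

    out_links[i] lists the pages that page i links to (pages are 0..N-1).
    Assumptions: uniform teleportation v = e/N, dangling distribution w = v,
    and no duplicate links.
    """
    n = len(out_links)
    v = [1.0 / n] * n                      # teleportation vector (assumed uniform)
    w = v                                  # dangling-page distribution (assumed = v)
    pi = [1.0 / n] * n                     # initial guess
    for _ in range(max_iter):
        new = [0.0] * n
        # c * Lbar^T pi: each non-dangling page splits its score over its links.
        for i, targets in enumerate(out_links):
            for j in targets:
                new[j] += c * pi[i] / len(targets)
        # c * w (a^T pi): mass sitting on dangling pages is redistributed by w.
        dangling_mass = sum(pi[i] for i in range(n) if not out_links[i])
        # (1 - c) * v (e^T pi): teleportation through the rank-1 term only.
        total = sum(pi)
        for j in range(n):
            new[j] += c * dangling_mass * w[j] + (1 - c) * total * v[j]
        diff = sum(abs(a - b) for a, b in zip(new, pi))   # 1-norm change
        pi = new
        if diff < tol:
            break
    return pi
```

For example, `pagerank([[1, 2], [2], [0], []])` handles a graph whose page 3 is dangling; because every page's score is either passed along a link, redistributed by w, or teleported, the scores keep summing to one.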

Considerable efforts have been devoted to the computation of PageRank due to its large-scale applications. This is especially important when one wants to compute multiple PageRank vectors depending on queries and users. For a detailed discussion, please refer to Section 5. In this paper, we present a method utilizing the best backlinks, which converges much faster than PageRank while its performance remains comparable. We are interested in understanding the roles of these influential links and their implications for link analysis algorithms, especially PageRank.

2 Research Questions 

Link analysis takes advantage of linking information in calculating document importance. For example, PageRank uses the backlinks of a document in updating its score. Intuitively, there are influential links that contribute a large portion of the score, and there are unimportant links that contribute only a negligible portion. We would like to ask the following questions.

• Where do the influential links come from? What types of documents are the influential sources? Are they authorities, hubs, or something else? How are they related to the nodes they influence?

• How many such influential sources are there? 

• How influential is a backlink to the score of a document? Most importantly, how influential are the most influential backlinks?

These questions are relevant to all link analysis algorithms. In this paper, we deal with PageRank. By answering these questions, we hope to gain insights into the connectivity of large, real-world graphs and the quality of documents, and to provide a better ranking. One result of this study is a ranking method that takes advantage of the most influential backlinks to discover authorities and communities.

3 The Best Back Links and MaxRank 

In the case of PageRank-style authority discovery, a natural 
definition of the best back link of a page is the one with the 
largest score. 

We discover the best backlinks in the same process as the authority score update, giving a so-called MaxRank method. The basic idea of this algorithm is that, with probability λ, the contributing score comes from the best backlink of the page; with probability 1 − λ, the contributing scores come from a random backlink of the page.

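3.1 The Algorithm In particular, the MaxRank of a page j (j = 1, 2, ..., N) is defined by

$$R(j) = c\Big[\lambda\, P(i^*_j, j) \max_{i \in B(j)} R(i) + (1-\lambda) \sum_{i \in B(j)} P(i,j)\, R(i)\Big] + (1-c)\, v(j), \tag{3.1}$$

where

$$i^*_j = \arg\max_{i \in B(j)} R(i),$$

λ ∈ [0, 1], B(j) is the set of backlink pages of page j, and P(i, j) is the probability of going from page i to page j.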
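Equation (3.1) can be evaluated by power iteration. The sketch below is a minimal illustration, not the authors' implementation. It assumes the random-surfer probabilities P(i, j) = 1/n_i used in the remainder of the paper, uniform teleportation v = e/N, ties for the best backlink broken by the smallest page index, and no dangling correction beyond what (3.1) states (dangling pages simply pass on no score). The function name `maxrank` and the adjacency-list input are hypothetical.

```python
def maxrank(out_links, lam=0.5, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for MaxRank, equation (3.1), with P(i, j) = 1/n_i.

    out_links[i] lists the pages that page i links to (pages are 0..N-1).
    Returns (R, best): the score vector and, for each page j, the best
    backlink i*_j found in the last iteration (None if B(j) is empty).
    """
    n = len(out_links)
    back = [[] for _ in range(n)]          # back[j] is B(j), the backlinks of j
    for i, targets in enumerate(out_links):
        for j in targets:
            back[j].append(i)
    R = [1.0 / n] * n                      # initial guess
    best = [None] * n
    for _ in range(max_iter):
        new = [0.0] * n
        for j in range(n):
            t = 0.0                        # T(R, j)
            if back[j]:
                # i*_j = argmax over B(j) of R(i); ties go to the smaller index.
                i_star = max(back[j], key=lambda i: (R[i], -i))
                best[j] = i_star
                t = lam * R[i_star] / len(out_links[i_star])
                t += (1 - lam) * sum(R[i] / len(out_links[i]) for i in back[j])
            new[j] = c * t + (1 - c) / n   # uniform v(j) = 1/N
        diff = sum(abs(a - b) for a, b in zip(new, R))   # 1-norm change
        R = new
        if diff < tol:
            break
    return R, best
```

Note that the best backlink contributes through both terms: λ·P(i*_j, j)·R(i*_j) from the max term and (1 − λ)·P(i*_j, j)·R(i*_j) from the sum, so its total weight is P(i*_j, j), while every other backlink is damped to (1 − λ)·P(i, j).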

3.2 Convergence of MaxRank In this section, we first give a theorem showing that both variants of MaxRank are well defined. A straightforward application of this theorem is that the power iteration for computing MaxRank is guaranteed to converge for λ ∈ [0, 1].

Theorem 1. For c ∈ (0, 1) and λ ∈ [0, 1], MaxRank is well defined.

Proof. For notational convenience, we define

$$T(R, j) = \lambda\, P(i^*_j, j) \max_{i \in B(j)} R(i) + (1-\lambda) \sum_{i \in B(j)} P(i,j)\, R(i).$$

Accordingly, we have

$$R(j) = c\, T(R, j) + (1-c)\, v(j), \qquad j = 1, 2, \ldots, N.$$

In matrix form, we have

$$R = c\, T(R) + (1-c)\, v, \tag{3.2}$$

where T(R) is a vector with T(R)(j) = T(R, j), j = 1, 2, ..., N.

Next we prove that T(R) is a non-expansion operator with respect to the 1-norm, which means that

$$\|T(R)\|_1 \le \|R\|_1. \tag{3.3}$$

According to the definitions of T(R), T(R)(j) and T(R, j), we have

$$T(R) = \mathcal{T} R,$$

where $\mathcal{T}$ is an N × N matrix with $\mathcal{T}(j, i) = P(i, j)$ if page i is the best backlink of page j, and $\mathcal{T}(j, i) = (1-\lambda)\, P(i, j)$ otherwise. (For the best backlink, the λ term and the (1 − λ) term add up to P(i, j).)

Then inequality (3.3) can be proven in the following steps:

$$\|T(R)\|_1 \le \|\mathcal{T}\|_1 \|R\|_1 = \max_{i = 1, \ldots, N} \sum_{j=1}^{N} \mathcal{T}(j, i)\, \|R\|_1 \le \max_{i = 1, \ldots, N} \sum_{j=1}^{N} P(i, j)\, \|R\|_1 = \|R\|_1.$$

The last step uses the fact that each row of P sums to one for a non-dangling page (and to zero for a dangling one). In the third step, equality holds when page i is the best backlink of every page it links to.

Thus T is a non-expansion mapping in the 1-norm. According to equation (3.2), R is defined by a contraction mapping composed of T and c < 1. Hence R is finite.
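Inequality (3.3) can also be sanity-checked numerically. The sketch below is an illustration only, not part of the proof: it draws a random directed graph and random positive score vectors R (so that ‖R‖₁ is simply the sum), applies T from (3.2) with the random-surfer choice P(i, j) = 1/n_i, and reports the largest observed ratio ‖T(R)‖₁ / ‖R‖₁, which (3.3) says cannot exceed 1. The graph model, the function name `max_norm_ratio`, and all parameter values are assumptions chosen for illustration.

```python
import random


def max_norm_ratio(n=50, p=0.1, lam=0.7, trials=200, seed=0):
    """Largest observed ||T(R)||_1 / ||R||_1 over random positive R."""
    rng = random.Random(seed)
    # Random directed graph without self-loops (hypothetical test model).
    out_links = [[j for j in range(n) if j != i and rng.random() < p]
                 for i in range(n)]
    back = [[] for _ in range(n)]
    for i, targets in enumerate(out_links):
        for j in targets:
            back[j].append(i)
    worst = 0.0
    for _ in range(trials):
        R = [rng.random() + 1e-9 for _ in range(n)]   # strictly positive scores
        total_T = 0.0
        for j in range(n):
            if not back[j]:
                continue                   # T(R, j) = 0 when B(j) is empty
            i_star = max(back[j], key=lambda i: R[i])
            t = lam * R[i_star] / len(out_links[i_star])
            t += (1 - lam) * sum(R[i] / len(out_links[i]) for i in back[j])
            total_T += t
        worst = max(worst, total_T / sum(R))
    return worst
```

Because every entry of the matrix in the proof is nonnegative, ‖T(R)‖₁ for nonnegative R equals the sum of T(R), which is what the loop accumulates.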



The definition of MaxRank enables straightforward estimation using power iteration starting from any initial guess. Convergence of the power iteration is guaranteed by an argument similar to that of Theorem 1.

Theorem 2. (Convergence) For c ∈ (0, 1) and λ ∈ [0, 1], the power iteration for solving MaxRank converges to the true vector defined in (3.1), irrespective of the initial vector.

We will consider the random surfer in the remainder of this paper. That is, the probability of going from a (non-dangling) page i to a page j is 1/n_i, where n_i is the number of (forward) links on page i. In this case, when λ = 0, the algorithm reduces to PageRank. When λ = 1, the algorithm considers only the "best" backlink it finds and ignores the contribution from the others. However, in our experience, this usually gives poor ranking results, because the selected best backlink pages are usually not of good quality.
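The two limiting cases follow directly from substituting P(i, j) = 1/n_i into (3.1), where i*_j denotes the best backlink of page j and $\bar{L}$ the row-normalized adjacency matrix. This short derivation ignores dangling pages, as (3.1) itself does:

$$
\begin{aligned}
\lambda = 0:&\quad R(j) = c \sum_{i \in B(j)} \frac{R(i)}{n_i} + (1-c)\, v(j), \quad\text{i.e.}\quad R = c\, \bar{L}^T R + (1-c)\, v,\\
\lambda = 1:&\quad R(j) = c\, \frac{R(i^*_j)}{n_{i^*_j}} + (1-c)\, v(j).
\end{aligned}
$$

Without dangling pages, $S = \bar{L}$ and $\bar{L} e = e$, so summing the λ = 0 equation over j gives $e^T R = c\, e^T R + (1-c)$, i.e. $e^T R = 1$. Hence $R = c\, S^T R + (1-c)\, v\, e^T R = G^T R$, which is exactly the PageRank equation.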

4 Empirical Results 

In this section, we study the proposed algorithm and the research questions on the Wikipedia English article dump, which contains about 6 million pages (articles or categories). For all algorithms, c = 0.85 was used. The teleportation probabilities were uniformly set to 1/N. All algorithms are updated by the standard power iteration; no sophisticated update is used for any algorithm.

Recall that we would like to study the following questions. What are the sources of the best backlinks? How many are there? How influential are they? Due to space limitations, for these questions we show only the case λ = 0.1 in this paper.

4.1 Sources of the Best Backlinks Table 3 shows the sources of the best backlinks for the top-50 pages on Wikipedia, using MaxRank with λ = 0.1. Note that this choice produces scores similar to PageRank's, as will be shown later. The sources of the best backlinks are mostly global authorities. The very top pages are seen to support many top pages. For example, "United_States" influences many other concepts, which further influence the rest of the site. In effect, this partitions the site into clusters of nodes, each of which contains only a small number of dominant nodes.

There are only 775,438 unique best backlink sources with MaxRank. They support the whole site and form a core. The size of this core is only about 0.7% of the total number of links (117,864,053) and about 13.5% of the total number of pages (5,743,047). On average, a core page "supports" about 3,620,343/775,438 ≈ 4.7 pages. This is also an estimate of the average size of the clusters. The size of the best backlink core for various λ is shown in Figure 1. For this example, λ = 0.3 leads to the smallest core. For λ larger than 0.7, the core is much larger and grows much more quickly with λ.

Figure 1: Number of best backlink sources on Wikipedia according to MaxRank with λ = 0, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99.
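The core statistics above are simple aggregates over the best-backlink assignment. As a minimal sketch, assume a hypothetical mapping `best` from every page that has at least one backlink to its best backlink source (for example, as collected from a MaxRank run). We read the 3,620,343 in the text as the number of such supported pages; that reading is our assumption.

```python
def core_stats(best, num_pages, num_links):
    """Summarize the best-backlink core.

    best: dict mapping each page with >= 1 backlink to its best backlink source.
    """
    core = set(best.values())               # unique best backlink sources
    return {
        "core_size": len(core),
        "share_of_pages": len(core) / num_pages,
        "share_of_links": len(core) / num_links,
        # average number of pages a core page supports (mean cluster size)
        "avg_pages_supported": len(best) / len(core),
    }
```

Plugging in the reported counts reproduces the quoted ratios: 775,438/5,743,047 ≈ 0.135, 775,438/117,864,053 ≈ 0.0066, and 3,620,343/775,438 ≈ 4.67.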

4.2 Influence of the Best Backlink Sources We measure the influence of the best backlink sources from three perspectives. The first measure is the collective influence of the best backlink sources in the graph, defined as the ratio of the sum of the scores of all best backlink sources to the sum of the scores of all pages. The collective influence of the core is 53.1%. Note that core pages make up only about 13.5% of the whole graph; thus the influence of the core is significant.
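The collective influence is a ratio of two score sums. A minimal sketch, assuming hypothetical dictionaries `scores` (page to MaxRank score) and `best` (page to its best backlink source):

```python
def collective_influence(scores, best):
    """Share of the total score held by the best backlink sources (the core).

    scores: dict page -> score; best: dict page -> its best backlink source.
    """
    core = set(best.values())
    return sum(scores[p] for p in core) / sum(scores.values())
```

For instance, with scores {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1} and best {1: 0, 2: 0, 3: 1}, the core is {0, 1} and holds about 70% of the total score.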

Different core sources have different strengths of influence. Some contribute many best backlinks, while others contribute only a few. This suggests a measure for influential sources: the number of times a page is the best backlink (TBB) for other nodes. Note that the TBB of an influential page equals the number of pages that the page supports. Table 1 shows the ordering of the sources according to the TBB measure. In addition, the table shows the ratio of TBB to the out-degree of each source, which measures the percentage of competitive links cast by the source. In this top list we see many hubs and authorities, and the hubs outnumber the authorities. Thus, on Wikipedia, the more links an article has, the more likely it is to influence others.
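TBB is a count over the best-backlink assignment, and the ratio in Table 1 divides it by the out-degree. The sketch below assumes hypothetical inputs `best` (page to its best backlink source) and `out_degree` (source to number of forward links); since a source can only be the best backlink of pages it links to, TBB never exceeds the out-degree.

```python
from collections import Counter


def tbb_ranking(best, out_degree, top=10):
    """Rank best backlink sources by TBB (times of being the best backlink).

    best: dict page -> its best backlink source; out_degree: dict page -> n_i.
    Returns (source, TBB, out-degree, TBB / out-degree) tuples, largest TBB first.
    """
    tbb = Counter(best.values())           # TBB(s) = number of pages s supports
    return [(s, k, out_degree[s], k / out_degree[s])
            for s, k in tbb.most_common(top)]
```

As a check against Table 1, the row for List_of_bird_genera has TBB 788 and out-degree 1929, and 788/1929 rounds to the listed ratio 0.408502.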

A log-log plot of the distributions of the out-degree and the TBB is shown in the left plot of Figure 3. Some key observations are as follows. First, the number of pages that have been the best backlink only a few times is very large, while the number of pages that have been the best backlink many times is very small. Second, the log-log curve of the TBB distribution is straighter, which means that the TBB distribution follows a power law more strictly. Third, for x > 10, the two curves follow similar power-law distributions with close exponents, with a large shift in the x direction, which indicates that the TBB of a page is much smaller than the out-degree of the same page.

Figure 2: Left to right: The TBB of the top 1–100, 101–1000, and 1001–10000 authorities on Wikipedia (λ = 0.1).

The (sorted) ratio between the TBB and the out-degree for each core page is shown in the middle plot of Figure 3. First, the sources whose ratio is smaller than 0.2 make up about 66% of the core. This means that, for the majority of the core, fewer than 20% of their links are best backlinks. Second, about 87,193 pages (11% of the core) have a ratio equal to 1.0. Astonishingly, 86,763 (99.5%) of them have only one link. This sheds light on the structure of Wikipedia: most of them are due to the existence of "redirect pages" in Wikipedia, which contain no content but a "link" to another article. Third, the remaining sources have a ratio larger than 0.2. Together with the nontrivial sources whose ratio is 1.0, they form the most competitive link sources of the core, accounting for about 23% of the core. Of them, only 4,632 sources have a ratio larger than 0.5, and 360 sources have a ratio larger than 0.8. In short, the number of nontrivial, competitive backlink sources is very small.
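The breakdown above is a set of threshold counts over the TBB / out-degree ratios. A minimal sketch, assuming hypothetical dictionaries `tbb` and `out_degree` keyed by core source; the single-link fraction among ratio-1.0 sources corresponds to the redirect-page observation.

```python
def ratio_summary(tbb, out_degree):
    """Summarize TBB / out-degree ratios over the core.

    tbb: dict source -> TBB; out_degree: dict source -> out-degree (>= TBB).
    """
    ratios = {s: tbb[s] / out_degree[s] for s in tbb}
    n = len(ratios)
    ones = [s for s, r in ratios.items() if r == 1.0]
    single = sum(out_degree[s] == 1 for s in ones)
    return {
        "frac_below_0.2": sum(r < 0.2 for r in ratios.values()) / n,
        "frac_equal_1": len(ones) / n,
        # ratio 1.0 with a single link: typically a redirect-like page
        "frac_single_link_among_ones": single / len(ones) if ones else 0.0,
        "count_above_0.5": sum(r > 0.5 for r in ratios.values()),
        "count_above_0.8": sum(r > 0.8 for r in ratios.values()),
    }
```

With the reported counts, 87,193/775,438 ≈ 0.112 of the core has ratio 1.0, and 86,763/87,193 ≈ 0.995 of those have a single link.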

The third measure takes the perspective of an ordinary page (either in the core or not): how strongly the page is influenced by its best backlink, measured by the ratio of the score contributed by the best backlink to the overall score of the page. We expect this measure to distinguish authorities. This ratio for all pages is shown in the right plot of Figure 3. For authorities with high scores, this ratio is very small; thus they are not easily influenced even by their best backlink source. As pages become less authoritative (along the negative direction of the x-axis), the values of this ratio become more diverse. For example, the values of this ratio cover almost the whole range (0, 1) for pages with a score equal to $10^{?}$.
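The text does not spell out exactly what "the score contributed by the best backlink" includes. The sketch below assumes that, under (3.1), the best backlink i*_j is counted through both the λ term and the (1 − λ) sum, so its total contribution is c·P(i*_j, j)·R(i*_j) = c·R(i*_j)/n_{i*_j}. The function name and inputs (a score list `R`, an adjacency list `out_links`, and a best-backlink list `best`) are hypothetical.

```python
def best_backlink_share(j, R, out_links, best, c=0.85):
    """Fraction of R[j] contributed by j's best backlink under (3.1).

    Assumes the contribution counts both the lambda term and the (1 - lambda)
    sum, i.e. c * P(i*, j) * R(i*) with P(i*, j) = 1 / n_{i*}.
    Returns None for pages without backlinks (best[j] is None).
    """
    i_star = best[j]
    if i_star is None:
        return None
    return c * R[i_star] / len(out_links[i_star]) / R[j]
```

On a three-page cycle with uniform scores, each page's single backlink accounts for a share c = 0.85 of its score; the remaining 0.15 comes from teleportation.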

Figure 2 shows the TBB versus the out-degree for the top authorities. For the top-100 authorities, the curve is almost linear and very close to y = x. (Note that all points lie on or below y = x, since a page cannot be the best backlink more times than it has links.) Thus their links are very influential. Further down the ordering of the authorities, we observe an increasing number of less influential pages.



4.3 Convergence Studies Figure 4 (left) compares the convergence rates of MaxRank and PageRank, measured in terms of the 1-norm error between successive iterations. MaxRank converges faster than PageRank, and the advantage is very significant for large λ. MaxRank with λ = 0.1 needs about 20 iterations to reach the accuracy that PageRank attains at its 30th iteration, while with λ = 0.9 MaxRank needs only 3 or 4 iterations.
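The convergence comparison relies on the 1-norm difference between successive iterates. A minimal sketch of that bookkeeping, assuming each method is available as a hypothetical one-step `update` function on a score list:

```python
def residual_history(update, x0, num_iter):
    """Run x_{k+1} = update(x_k) and record the 1-norm error ||x_{k+1} - x_k||_1."""
    x, history = list(x0), []
    for _ in range(num_iter):
        nxt = update(x)
        history.append(sum(abs(a - b) for a, b in zip(nxt, x)))
        x = nxt
    return x, history


def iterations_to_reach(history, target):
    """First iteration (1-based) whose error is <= target, or None."""
    for k, err in enumerate(history, start=1):
        if err <= target:
            return k
    return None
```

To make a comparison of the form used above, one would take `target = pagerank_history[29]` (PageRank's error at its 30th iteration) and call `iterations_to_reach(maxrank_history, target)`; the history names here are hypothetical.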

4.4 Performance of MaxRank We compared the top lists of the three algorithms, since the top list is usually the most important in practice. Table 2 shows the top 50 pages by PageRank (MaxRank with λ = 0). Tables 3, 4 and 5 show the top results of MaxRank with λ = 0.1, 0.5 and 0.9. We also tested λ = 1 for MaxRank, but the results were very poor. The intuition is that the "best backlinks" found are not good without considering the wisdom of the majority. Note that in the tables, "ISBN" is short for "International_Standard_Book_Number", and "Inter-Air-Trans-code" is short for "International_Air_Transport_Association_airport_code".

The top lists of these algorithms have some similarities and also some differences. To measure the similarity between the algorithms, we performed comparisons using two measures. One is the percentage of common pages in the top-k lists of two algorithms,

$$c_k = \frac{\#\,\text{common pages in the top-}k}{k} \in [0, 1].$$
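The overlap measure c_k is a set intersection of two top-k lists. A minimal sketch; the example lists are the first five rows of Tables 2 and 3 (PageRank and MaxRank with λ = 0.1), and the function name is ours.

```python
def common_fraction(top_a, top_b, k):
    """c_k: fraction of pages shared by the top-k lists of two rankings."""
    return len(set(top_a[:k]) & set(top_b[:k])) / k


pagerank_top5 = ["United_States", "ISBN", "United_Kingdom",
                 "Wikimedia_Commons", "Wiktionary"]
maxrank01_top5 = ["United_States", "ISBN", "United_Kingdom",
                  "Wikimedia_Commons", "Biography"]
print(common_fraction(pagerank_top5, maxrank01_top5, 5))  # 0.8
```

Four of the five pages coincide, so c_5 = 0.8 for this pair; note that c_k ignores the order within the top-k lists, which is why a rank-order measure is used alongside it.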

The other is Kendall's tau coefficient, which measures the correlation between two rankings [?]. Here we care about whether MaxRank ranks the top-k pages of PageRank in a manner consistent with PageRank, so the measure used is

$$\tau_k = \frac{n_k}{k(k-1)/2} \in [0, 1],$$

where $n_k$ is the number of concordant orderings among all pairs of pages from the top-k pages of PageRank. The results for $c_k$ are summarized in the middle plot of Figure 4 for k = 5, 10, 30, 50, 80, 100, 300, 500, 800, 1000. Notice that MaxRank performs remarkably similarly to PageRank for λ = 0.1, because the effect of the best backlinks is kept small. In general, the smaller λ is, the more similar the ranking of MaxRank is to that of PageRank.

Figure 3: Left: Distributions of TBB and out-degree for the best backlink sources on Wikipedia. Middle: The sorted ratio between TBB and out-degree for the best backlink sources. Right: The ratio of the score influenced by the best backlink source for all pages (with a nonzero number of backlinks). λ = 0.1.

The results for τ_k are summarized in the right plot of Figure 4. Similarly, the smaller λ is, the more similar the ranking is to PageRank. In particular, MaxRank with λ = 0.1 orders PageRank's top-k pages very similarly to PageRank for all k. For large λ such as 0.9 and 0.99, MaxRank still has about 80% similarity to PageRank on average, and 65% at worst. The difference between the orderings for λ = 0.9 and λ = 0.99 is relatively small for all k. This suggests that increasing λ to large values close to 1 produces stable rankings.
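The concordance measure τ_k counts, over all pairs from PageRank's top-k pages, how many pairs the other ranking orders the same way. A minimal sketch; how pages missing from the other ranking are treated is not stated in the text, so ranking them below every listed page is an assumption. This concordant fraction relates to the classical Kendall coefficient by τ = 2τ_k − 1 when there are no ties.

```python
from itertools import combinations


def concordant_fraction(reference, other, k):
    """tau_k = n_k / (k (k - 1) / 2) for the top-k pages of `reference`.

    n_k counts pairs (a, b) from reference[:k], with a ranked above b there,
    that `other` also ranks a above b. Pages missing from `other` are treated
    as ranked below every listed page (an assumption); a pair of two missing
    pages counts as discordant.
    """
    pos = {page: r for r, page in enumerate(other)}
    missing = len(other)
    top = reference[:k]
    n_k = sum(1 for a, b in combinations(top, 2)
              if pos.get(a, missing) < pos.get(b, missing))
    return n_k / (k * (k - 1) / 2)
```

For example, PageRank's top five pages in Table 2 appear in the same relative order among the first nine rows of Table 3 (λ = 0.1), so τ_5 = 1 for that pair.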

5 Discussion 

The size of the Web creates a large computational burden for PageRank. Currently, most large commercial search engines index 10 to 100 billion pages. However, the Web is actually much larger; e.g., there were already 1 trillion unique URLs in 2008 according to Google¹. Computing a single, global PageRank for the Web is already very demanding. Page et al. used power iteration to take advantage of the sparse nature of the link structure of the Web [25]. Kamvar et al. proposed an adaptive method that monitors the change in the PageRank update for each page and stops updating those pages whose values no longer change [?]. Methods proposed in [11], [17], and [?] take advantage of the structure of link matrices and compute PageRank block-wise. Other linear system solvers have also been used to update PageRank. For example, Kamvar et al. used extrapolation methods [18]; Gleich et al. proposed an inner-outer iteration procedure [19], which essentially applies preconditioning incrementally; and Langville and Meyer, and Ipsen and Kirkland, studied aggregation/disaggregation methods [23], [?].

¹ http://googleblog.blogspot.com/2008/07/we-knew-the-Web-was-big.html

The problem becomes more severe when one wants to compute many score vectors, such as many personalized PageRank vectors [25, 15] or context-sensitive or query-dependent scores [26, 27, 13], which have numerous applications in search systems. Jeh and Widom proposed a scalable method that precomputes some components of PageRank and saves them for efficient future computation [?]. Fogaras et al. simulated a number of random walks and used Monte Carlo methods to estimate personalized PageRank vectors [6].

The computation of PageRank is very demanding; thus distributed, parallel computation becomes necessary for large graphs. The methods in [21, 30] partition the whole graph into disjoint subgraphs and then compute a local PageRank for each subgraph. The local PageRanks are then merged, taking into account the links between the subgraphs. The methods in [?, 13] feature the use of advanced linear system solvers. Some researchers have also considered efficient hardware structures, such as specially optimized circuits [29]. For excellent surveys of PageRank, please refer to [?].

Our method is very different from these efforts in the literature, though it should be noted that these techniques also apply to our algorithm in a straightforward way. Our work takes advantage of the influential links in updating PageRank-style scores. We hope that by doing so one can speed up convergence while keeping the performance similar or comparable to that of PageRank.

6 Conclusion 

The observation leading to this paper is that there exist one or more valuable backlinks for any page with a nonzero number of backlinks. We show that by leveraging the best backlinks, a recursive update can converge much faster than PageRank. The algorithm has a parameter λ ∈ [0, 1], which controls the effect of the best backlinks discovered. When λ = 0, the algorithm reduces to PageRank. Empirical results show that with large λ (but smaller than 1) the new algorithm converges dramatically faster, while its results still have 80% similarity to PageRank on average (measured with Kendall's tau). Thus our algorithm is advantageous for ranking in large search systems, where the computation of many personalized, query-dependent or context-sensitive score vectors is demanding.

Table 1: The ordering of the best backlink sources according to the "Times of being the Best Backlinks" (TBB). λ = 0.1.

Rank | Page | TBB | Out-degree | TBB / Out-degree | Score
45 | … | … | … | … | 0.000186
46 | List_of_bird_genera | 788 | 1929 | 0.408502 | 0.000005
47 | List_of_subjects_in_Gray's_Anatomy:_XI._Splanchnology | 786 | 1116 | 0.704301 | 0.000046
48 | List_of_State_Routes_in_New_York | 772 | 985 | 0.783756 | 0.000025
49 | Italy | 762 | 1310 | 0.581679 | 0.001633
50 | List_of_ICF_Canoe_Sprint_World_Championships_medalists_in_men's_kayak | 761 | 865 | 0.879769 | 0.000007

Results on Wikipedia show that the number of unique best backlink sources (the so-called "core" in this paper) is only about 13.5% of the total number of pages. However, the sum of their scores is more than half (about 53.1%) of the total score. We propose to measure a source in the core by the times of being the best backlink (TBB) and by the ratio between TBB and the out-degree. Results show that TBB follows a power-law distribution with an exponent similar to that of the out-degree distribution. According to these two measures, the number of competitive backlink sources is very small. Results also show that a top authority is not easily influenced by its best backlink source.
Figure 4: Left: Convergence rate comparisons of MaxRank and PageRank. Middle and Right: Percentage of common pages and Kendall's tau for the top-k lists of MaxRank and PageRank.
Table 2: Top 50 Wikipedia pages by MaxRank(0) (PageRank).

Rank | Page | Score | Best backlink
1 | United_States | 0.013911 | "ISBN"
2 | "ISBN" | 0.007283 | United_States
3 | United_Kingdom | 0.006135 | United_States
4 | Wikimedia_Commons | 0.005986 | Wiktionary
5 | Wiktionary | 0.004151 | Wikimedia_Commons
6 | France | 0.004081 | United_States
7 | Canada | 0.004049 | United_States
8 | Biography | 0.003964 | Wiki
9 | Germany | 0.003860 | United_States
10 | England | 0.003766 | United_States
11 | Biological_classification | 0.003562 | Arthropod
12 | English_language | 0.003522 | United_States
13 | Australia | 0.003426 | United_States
14 | World_War_II | 0.003186 | United_States
15 | Binomial_nomenclature | 0.003176 | Biological_classification
16 | Japan | 0.003091 | United_States
17 | India | 0.003026 | United_States
18 | Internet_Movie_Database | 0.002907 | Alexa_Internet
19 | Abbreviation | 0.002882 | USA
20 | Music_genre | 0.002844 | Poland
21 | Association_football | 0.002766 | United_States
22 | Europe | 0.002751 | United_States
23 | Record_label | 0.002734 | Music_genre
24 | Italy | 0.002612 | United_States
25 | 2007 | 0.002505 | Australia
26 | Russia | 0.002339 | United_States
27 | London | 0.002152 | United_Kingdom
28 | Spain | 0.002150 | United_States
29 | Latin | 0.002077 | United_States
30 | 2006 | 0.002030 | Germany
31 | Personal_name | 0.001989 | Given_name
32 | 2008 | 0.001919 | Germany
33 | New_York_City | 0.001850 | United_States
34 | Netherlands | 0.001841 | United_States
35 | Poland | 0.001829 | United_States
36 | Sweden | 0.001825 | United_States
37 | Scientific_name | 0.001752 | Biological_classification
38 | Public_domain | 0.001752 | Wikimedia_Commons
39 | Brazil | 0.001663 | United_States
40 | Time_zone | 0.001663 | United_States
41 | China | 0.001658 | World_War_II
42 | French_language | 0.001651 | United_States
43 | World_War_I | 0.001643 | United_States
44 | Catholic_Church | 0.001623 | France
45 | California | 0.001620 | United_States
46 | New_Zealand | 0.001592 | United_States
47 | Area | 0.001569 | United_States
48 | 2005 | 0.001559 | France
49 | New_York | 0.001554 | United_States
50 | German_language | 0.001515 | United_States

Table 3: Top 50 Wikipedia pages by MaxRank, λ = 0.1.

Rank | Page | Score | Best backlink
1 | United_States | 0.009093 | "ISBN"
2 | "ISBN" | 0.004404 | United_States
3 | United_Kingdom | 0.003863 | United_States
4 | Wikimedia_Commons | 0.003614 | Wiktionary
5 | Biography | 0.003035 | Wiki
6 | Biological_classification | 0.002773 | Arthropod
7 | Canada | 0.002626 | United_States
8 | France | 0.002546 | United_States
9 | Wiktionary | 0.002474 | Wikimedia_Commons
10 | England | 0.002462 | United_States
11 | Germany | 0.002452 | United_States
12 | Binomial_nomenclature | 0.002273 | Biological_classification
13 | Australia | 0.002203 | United_States
14 | English_language | 0.002172 | United_States
15 | Music_genre | 0.002130 | Poland
16 | Record_label | 0.002054 | Music_genre
17 | Internet_Movie_Database | 0.002036 | Alexa_Internet
18 | Japan | 0.002021 | United_States
19 | India | 0.001968 | United_States
20 | World_War_II | 0.001946 | United_States
21 | Association_football | 0.001910 | United_States
22 | Abbreviation | 0.001786 | USA
23 | Europe | 0.001690 | United_States
24 | 2007 | 0.001658 | Australia
25 | Italy | 0.001633 | United_States
26 | Personal_name | 0.001559 | Given_name
27 | Russia | 0.001464 | United_States
28 | London | 0.001363 | United_Kingdom
29 | Spain | 0.001339 | United_States
30 | 2006 | 0.001322 | Germany
31 | 2008 | 0.001274 | Germany
32 | Scientific_name | 0.001236 | Biological_classification
33 | Poland | 0.001206 | United_States
34 | New_York_City | 0.001183 | United_States
35 | Sweden | 0.001155 | United_States
36 | Latin | 0.001140 | United_States
37 | Netherlands | 0.001136 | United_States
38 | Public_domain | 0.001120 | Wikimedia_Commons
39 | Time_zone | 0.001077 | United_States
40 | Brazil | 0.001076 | United_States
41 | California | 0.001055 | United_States
42 | Record_producer | 0.001024 | Music_genre
43 | China | 0.001023 | Japan
44 | New_Zealand | 0.001007 | United_States
45 | 2005 | 0.001006 | France
46 | World_War_I | 0.001004 | United_States
47 | New_York | 0.000999 | United_States
48 | Romania | 0.000968 | United_States
49 | Area | 0.000966 | United_States
50 | Politician | 0.000964 | Video_game



Table 4: Top 50 Wikipedia pages by MaxRank, λ = 0.5.

Rank | Page | Score | Best backlink
1 | United_States | 0.002444 | "ISBN"
2 | Biography | 0.001198 | Genre
3 | Biological_classification | 0.001103 | Arthropod
4 | "ISBN" | 0.000949 | United_States
5 | United_Kingdom | 0.000929 | United_States
6 | Music_genre | 0.000753 | Record_producer
7 | Record_label | 0.000726 | Music_genre
8 | Wikimedia_Commons | 0.000712 | Association_football
9 | Canada | 0.000685 | United_States
10 | England | 0.000671 | United_States
11 | Binomial_nomenclature | 0.000656 | Biological_classification
12 | Personal_name | 0.000640 | Given_name
13 | Internet_Movie_Database | 0.000630 | Royal_Navy
14 | Germany | 0.000607 | United_States
15 | France | 0.000603 | United_States
16 | Australia | 0.000558 | United_States
17 | India | 0.000543 | United_States
18 | Association_football | 0.000533 | United_States
19 | Japan | 0.000523 | United_States
20 | Wiktionary | 0.000510 | Wikimedia_Commons
21 | English_language | 0.000502 | United_States
22 | 2007 | 0.000454 | Australia
23 | Abbreviation | 0.000436 | USA
24 | Arthropod | 0.000423 | Lepidoptera
25 | World_War_II | 0.000415 | United_States
26 | Italy | 0.000392 | United_States
27 | Europe | 0.000373 | United_States
28 | Studio_album | 0.000370 | Music_genre
29 | Politician | 0.000364 | Video_game
30 | Record_producer | 0.000363 | Music_genre
31 | 2008 | 0.000349 | Germany
32 | Russia | 0.000343 | United_States
33 | 2006 | 0.000340 | Germany
34 | London | 0.000332 | United_Kingdom
35 | Scientific_name | 0.000326 | Biological_classification
36 | Poland | 0.000318 | United_States
37 | Spain | 0.000313 | United_States
38 | Romania | 0.000310 | United_States
39 | New_York_City | 0.000294 | United_States
40 | Public_domain | 0.000293 | Wikimedia_Commons
41 | Time_zone | 0.000288 | United_States
42 | Brazil | 0.000282 | United_States
43 | Sweden | 0.000280 | United_States
44 | California | 0.000278 | United_States
45 | Television | 0.000270 | United_Kingdom
46 | Drainage_basin | 0.000267 | New_York_City
47 | Netherlands | 0.000259 | United_States
48 | 2005 | 0.000255 | France
49 | New_York | 0.000249 | United_States
50 | New_Zealand | 0.000249 | United_States

Table 5: Top 50 Wikipedia pages by MaxRank, λ = 0.9.

Rank | Page | Score | Best backlink
1 | United_States | 0.000311 | United_Kingdom
2 | Biography | 0.000194 | Autobiography
3 | Biological_classification | 0.000175 | Arthropod
4 | Music_genre | 0.000111 | Record_producer
5 | United_Kingdom | 0.000109 | United_States
6 | Record_label | 0.000106 | Music_genre
7 | Personal_name | 0.000105 | Given_name
8 | "ISBN" | 0.000099 | United_States
9 | England | 0.000088 | United_States
10 | Internet_Movie_Database | 0.000087 | Royal_Navy
11 | Canada | 0.000085 | United_States
12 | Binomial_nomenclature | 0.000079 | Biological_classification
13 | Arthropod | 0.000075 | Lepidoptera
14 | India | 0.000073 | United_States
15 | Germany | 0.000072 | United_States
16 | France | 0.000069 | United_States
17 | Australia | 0.000067 | United_States
18 | Association_football | 0.000063 | United_States
19 | Japan | 0.000062 | United_States
20 | Wikimedia_Commons | 0.000061 | Arthropod
21 | Politician | 0.000059 | Video_game
22 | Studio_album | 0.000058 | Music_genre
23 | Abbreviation | 0.000058 | USA
24 | English_language | 0.000057 | United_States
25 | 2007 | 0.000057 | Australia
26 | Record_producer | 0.000054 | Music_genre
27 | Wiktionary | 0.000052 | Wikimedia_Commons
28 | Geocode | 0.000048 | UN/LOCODE
29 | UN/LOCODE | 0.000047 | "Inter-Air-Trans-code"
30 | Italy | 0.000046 | United_States
31 | 2008 | 0.000044 | Germany
32 | Romania | 0.000044 | United_States
33 | Lepidoptera | 0.000042 | Moth
34 | World_War_II | 0.000042 | United_States
35 | 2006 | 0.000040 | Germany
36 | Drainage_basin | 0.000039 | New_York_City
37 | London | 0.000039 | United_Kingdom
38 | Television | 0.000039 | United_Kingdom
39 | Europe | 0.000039 | United_States
40 | Poland | 0.000038 | United_States
41 | Time_zone | 0.000038 | United_States
42 | Russia | 0.000038 | United_States
43 | Genus | 0.000037 | Biological_classification
44 | Conservation_status | 0.000036 | IUCN_Red_List
45 | Spain | 0.000036 | United_States
46 | Public_domain | 0.000035 | Wikimedia_Commons
47 | Brazil | 0.000035 | United_States
48 | New_York_City | 0.000035 | United_States
49 | California | 0.000035 | United_States
50 | IUCN_Red_List | 0.000034 | Conservation_status


